We address the "major open problem" of evaluating how much increased efficiency in estimation is possible using non-separable, as opposed to separable, measurements of N copies of m-level quantum systems. First, we study the six cases m = 2, N = 2, ..., 7 by computing the 3x3 Fisher information matrices for the corresponding optimal measurements recently devised by Vidal et al (Phys. Rev. A 60, 126 [1999]). We obtain simple polynomial expressions for the ("Gill-Massar") traces of the products of the inverse of the quantum Helstrom information matrix and these Fisher information matrices. The six traces all have minima of 2N - 1 in the pure state limit, while for separable measurements (Phys. Rev. A 61, 042312 [2000]) the traces can equal N, but not exceed it. Then, the result of an analysis for m = 3, N = 2 leads us to conjecture that for optimal measurements for all m and N, the Gill-Massar trace achieves a minimum of (2N - 1)(m - 1) in the pure state limit.
I. INTRODUCTION



We investigate information-theoretic properties of the optimal measurement schemes recently devised by Vidal et al [1], thereby helping to address the "major open problem" [?] of evaluating how much increased efficiency in estimation is possible using non-separable measurements (cf. [?]). In their extensive study, "State estimation for large ensembles," which we seek to extend here, Gill and Massar stated that "we cannot compare our results with the recent analysis of covariant [optimal] measurements on mixed states [1] because we suppose separability of the measurement, whereas [1] does not." A "separable measurement is one that can be carried out sequentially on separate particles, where the measurement on one particle at any stage (and indeed which particle to measure: one is allowed to measure particles several times) can depend arbitrarily on the outcomes so far" [?].

The analyses here are conducted in terms of the (classical) Fisher information of the probability distributions associated with the non-separable measurements. They make use of the quantum (Helstrom) Cramer-Rao bound on the Fisher information matrix for any oprom (operator-valued probability measure). By contrast, the studies of Vidal and his several Barcelona colleagues [1, ?] have been formulated primarily in terms of fidelity, F(ρ, ρ') (ρ and ρ' being density matrices) [?, ?], and secondarily in terms of information gain [?]. There surely exists an intimate connection between these approaches, since 2(1 - F(ρ, ρ')) functions as the Bures distance between ρ and ρ'. The Bures metric is a distinguished member (the minimal one) of a continuum of possible quantum extensions of the (classical) Fisher information metric, each associated with a distinct operator monotone function. The Helstrom-Cramer-Rao bound corresponds to the particular use of the Bures metric via the concept of the symmetric logarithmic derivative [?]. An interesting hypothesis is that, asymptotically, the Fisher information matrix for optimal measurements is simply proportional to the metric tensor associated with some specific operator monotone function. (Our results below indicate that such a role is definitely not played by the Bures metric.)



We shall be concerned here primarily (cf. secs. III D 2 and III D 4) with the two-level quantum systems, representable by the 2x2 density matrices

$$\rho = \frac{1}{2}\begin{pmatrix} 1 + z & x + iy \\ x - iy & 1 - z \end{pmatrix}, \qquad (1)$$

where r² = x² + y² + z² ≤ 1. The particular (x, y, z) parameterization employed in (1) corresponds to the use of Cartesian coordinates for the "Bloch (or Poincaré) sphere" (unit ball in three-space) representation of the two-level systems [15], [16, sec. 4.2]. The alternative (spherical coordinate) parameter r is the radial distance from the origin. Pure states, for which |ρ| = 0, correspond to r = 1, and the fully mixed state, for which |ρ| = 1/4, to r = 0.
For the cases of N copies (N = 2, ..., 7) of a two-level quantum system (1), we obtain below in sec. III C a quite interesting pattern of results of increased efficiency using non-separable measurements, which strongly suggests generalizability to arbitrary N. Examining the cases N > 7 explicitly would entail considerable additional computations for each specific N, substantial analytical advances (cf. sec. IV C) allowing one to formally establish the measure of increased efficiency for arbitrary N, or both. (We note that Latorre et al [8] had to proceed case-by-case, that is, each N individually, since they "did not know how to build the POVM algorithmically".) In sec. IV C we explore one possible approach in this regard, attempting to explain the Fisher information matrices we compute in sec. III in terms of monotone metrics. In sec. III, we also formulate a conjecture as to the increase in efficiency achievable using non-separable optimal measurements for N copies of m-level quantum systems in general.

To begin our study, immediately below in sec. II, we expand upon an observation [17, p. 2684] regarding an information-theoretic relationship between certain classical and quantum entities. These are the Fisher information matrix for a certain (quadrinomial) multinomial probability distribution and the quantum Helstrom information matrix (proportional to the Bures metric tensor). We also discuss the implications of this relationship for optimal measurements.

In sec. IV we examine further ramifications for issues of state estimation and universal coding (data compression) [18, 19]. There appears to be an interesting relation between the devising of optimal measurements as in [1] and universal quantum coding. Both processes involve averaging with respect to isotropic prior probability distributions by "projecting onto total spin eigenspaces, and within each such subspace, onto total spin eigenstates with maximal total spin component in some direction" [?] (cf. [?, eqs. (5.33) and (5.34)] and [?, eq. (2.48)]). The particular prior distribution which yields both the minimax and maximin for the universal quantum coding of the two-level systems is based on the quasi-Bures metric, a particular example of a monotone metric. We attempt in sec. IV C to relate the Fisher information matrices we compute in sec. III to the monotone metrics.
II. PROPORTIONALITY BETWEEN HELSTROM AND FISHER INFORMATION MATRICES



The density matrices (1) turn out to have an intimate relationship with a particular form of multinomial (that is, quadrinomial) probability distribution, the four distinct possible outcomes being assigned probabilities

$$x^2, \qquad y^2, \qquad z^2, \qquad 1 - x^2 - y^2 - z^2. \qquad (2)$$

One can attach a Helstrom information matrix to the three-dimensional convex set of two-level quantum systems (1). We do so by adapting one (the simplest) of the "explicit" formulas of Dittmann [?, eq. (3.7)] [?],

$$d_{\mathrm{Bures}}(\rho, \rho + d\rho)^2 = \frac{1}{4}\,\mathrm{Tr}\Big\{ d\rho\, d\rho + \frac{1}{|\rho|}\,(d\rho - \rho\, d\rho)(d\rho - \rho\, d\rho) \Big\}, \qquad (3)$$

which yields the 3x3 quantum (Helstrom) information matrix [?] (that is, four times the Bures metric tensor [?]),

$$H_q(x,y,z) = \frac{1}{1 - x^2 - y^2 - z^2}\begin{pmatrix} 1 - y^2 - z^2 & xy & xz \\ xy & 1 - x^2 - z^2 & yz \\ xz & yz & 1 - x^2 - y^2 \end{pmatrix}. \qquad (4)$$
We use the subscripts q and c, in a suggestive, perhaps not fully rigorous manner, to denote results stemming from quantum or classical considerations. Also, note that (4) "blows up" at the pure states themselves. It will therefore be problematic, at best, to compare results pertaining to (4) directly with ones based on pure state models [?, ?]. In spherical coordinates (r, θ, φ), x = r cos θ, y = r sin θ cos φ, z = r sin θ sin φ, the matrix (4) takes the diagonal form



$$H_q(r,\theta,\phi) = \begin{pmatrix} \dfrac{1}{1 - r^2} & 0 & 0 \\ 0 & r^2 & 0 \\ 0 & 0 & r^2\sin^2\theta \end{pmatrix} \qquad (5)$$

for this orthogonal system of coordinates (cf. [28]). (Below, in the interest of succinctness, we will replace the frequently occurring expression x² + y² + z² by its equivalent, r².)

Now, the quantum information matrices (4) and (5) are simply proportional to the (classical) Fisher information [?] matrices I_c(x, y, z) and I_c(r, θ, φ) for the quadrinomial probability distribution (2). (By way of algorithmic example, consider the xy-entry of the 3x3 Fisher information matrix in its Cartesian coordinate form, I_c(x, y, z). It is computable as the expected value of the [two-fold] product of the logarithmic derivatives of (2) with respect to x and with respect to y.) More precisely, the nine entries of I_c(x, y, z) are all four times the corresponding entries of (4), that is,

$$I_c(x,y,z) = 4H_q(x,y,z). \qquad (6)$$

A natural explanation for this phenomenon is that the information geometry [30] of both models is that of the standard metric on the surface of a three-sphere in four-dimensional Euclidean space [13, 31].
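Relation (6) is easy to check numerically. The following is a minimal sketch in Python (standard library only; the function names are ours). It evaluates the Fisher information of the quadrinomial distribution (2) from its exact gradients and compares it with four times the Helstrom matrix (4). The sketch assumes x, y and z are all nonzero and r < 1, so that every outcome probability is positive.

```python
def helstrom(v):
    """Helstrom information matrix (4): delta_ij + v_i v_j / (1 - r^2)."""
    r2 = sum(c * c for c in v)
    return [[(1.0 if i == j else 0.0) + v[i] * v[j] / (1.0 - r2)
             for j in range(3)] for i in range(3)]


def quadrinomial_fisher(v):
    """Fisher matrix sum_k (d_i p_k)(d_j p_k) / p_k for the outcomes (2).

    Requires x, y, z all nonzero and r < 1, so that every probability is positive.
    """
    x, y, z = v
    r2 = x * x + y * y + z * z
    probs = [x * x, y * y, z * z, 1.0 - r2]
    # Exact gradients of each probability with respect to (x, y, z).
    grads = [(2 * x, 0.0, 0.0), (0.0, 2 * y, 0.0), (0.0, 0.0, 2 * z),
             (-2 * x, -2 * y, -2 * z)]
    return [[sum(g[i] * g[j] / p for p, g in zip(probs, grads))
             for j in range(3)] for i in range(3)]


if __name__ == "__main__":
    v = (0.3, -0.4, 0.5)
    fc, hq = quadrinomial_fisher(v), helstrom(v)
    # Largest deviation from eq. (6); it should be at rounding level.
    print(max(abs(fc[i][j] - 4 * hq[i][j]) for i in range(3) for j in range(3)))
```

Using exact gradients rather than finite differences keeps the comparison free of discretization error, so any mismatch would point to the formulas themselves.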

Both quantum (Helstrom) information and Fisher information possess the property of additivity. That is, for N independent identical density matrices or probability distributions, the information matrices (possibly scalars) are N times those for a single one [?, exer. 1.10], [?, sec. VI.4], [32-35].



By the quantum version of the Cramer-Rao theorem, the inverse matrix H_q(x, y, z)⁻¹ serves as a lower bound on the variance-covariance matrix V(x, y, z) for any unbiased estimator of the parameters (x, y, z) of ρ. (This means that the matrix difference V(x, y, z) - H_q(x, y, z)⁻¹ must be nonnegative definite, that is, have all its eigenvalues nonnegative.) In this regard,

$$H_q(x,y,z)^{-1} = \begin{pmatrix} 1 - x^2 & -xy & -xz \\ -xy & 1 - y^2 & -yz \\ -xz & -yz & 1 - z^2 \end{pmatrix}. \qquad (7)$$

(Of course, H_q(r, θ, φ)⁻¹ is diagonal.)
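The closed form (7) can be obtained without any explicit 3x3 inversion. Writing (4) as a rank-one update of the identity, with v = (x, y, z)ᵀ, and applying the Sherman-Morrison formula gives the following short derivation (the notation v and I_3 is ours).

```latex
% (4) as a rank-one update of the identity, with r^2 = v^T v:
H_q = I_3 + \frac{\mathbf{v}\mathbf{v}^{T}}{1 - r^2}
% Sherman-Morrison: (I + u w^T)^{-1} = I - u w^T / (1 + w^T u), here with u = v/(1 - r^2), w = v
H_q^{-1} = I_3 - \frac{\mathbf{v}\mathbf{v}^{T}/(1 - r^2)}{1 + r^2/(1 - r^2)} = I_3 - \mathbf{v}\mathbf{v}^{T}
% Eigenvalues of H_q: 1/(1 - r^2) along v, and 1 (twice) orthogonal to v, matching (5).
```

The same decomposition is convenient later, since several of the Fisher information matrices below share the eigenvectors of H_q.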

By dint of the additivity of information, in conjunction with the Cramer-Rao theorem (cf. [?, eq. (26)]), one can conclude the following: for N < 4 independent identical two-level systems, it is not possible to devise an oprom [?] which has for its outcomes the quadrinomial distribution (2) (cf. [?]). (When we attempted to construct such an oprom for the case N = 2, we found that the four operators could not all be nonnegative definite if they were to yield (2).) However, for N ≥ 4, the question of whether such an oprom exists would appear to be a completely open one, since the Cramer-Rao theorem no longer rules out its possibility. (The results of Vidal et al [1] show that the optimal minimal number of measurements for N > 3 is at least fifteen. This exceeds the four outcomes of an oprom that would give the quadrinomial probability distribution (2).) If such an oprom could be found for N = 4 itself, then the Cramer-Rao inequality would be fully saturated.
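The threshold N = 4 in the preceding argument can be made explicit. The sketch below combines (6) with the additive bound N H_q for N copies; here "⪰ 0" denotes nonnegative definiteness.

```latex
% Outcomes distributed as (2) carry I_c = 4 H_q, by (6).
% For N copies the quantum Cramer-Rao bound, with additivity, requires N H_q - I_c \succeq 0:
N H_q - I_c = (N - 4)\, H_q \succeq 0
\quad\Longleftrightarrow\quad N \ge 4 ,
% because H_q is positive definite for r < 1 (its eigenvalues are 1/(1 - r^2) and 1).
% At N = 4 the difference vanishes identically, i.e. the bound would be saturated.
```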
III. ANALYSES OF OPTIMAL MEASUREMENTS OF VIDAL ET AL FOR N COPIES OF TWO-LEVEL QUANTUM SYSTEMS



A. Computation of the Fisher Information Matrices 

1. N = 2



Let us now consider the probability distribution in [1] obtained from the optimal minimal number (five) of measurements for the case of N = 2 identical independent copies of the two-level systems (1). The five probabilities, as we have explicitly found, can be written as the three

$$\frac{1}{4}(1 - r^2), \qquad \frac{3}{16}(1 + z)^2, \qquad \frac{1}{48}\big(8x^2 - 4\sqrt{2}\,x(z - 3) + (z - 3)^2\big), \qquad (8)$$

together with the pair

$$\frac{1}{48}\big(9 + 2x^2 \pm 4\sqrt{3}\,xy + 6y^2 + 2\sqrt{2}\,(x \pm \sqrt{3}\,y)(z - 3) - 6z + z^2\big). \qquad (9)$$



Quite remarkably, the associated Fisher information matrix (I_c) turns out to equal the quantum (Helstrom) information matrix H_q(x, y, z) precisely. It does not equal 2H_q(x, y, z), which is the upper bound furnished by the quantum Cramer-Rao theorem. So the bound could be said to be "half-saturated". (In regard to this specific result, R. Gill has observed that there may exist other measurement schemes which are sub-optimal according to the fidelity criterion of [1], but superior in terms of Fisher information (cf. [?]).)
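The N = 2 result can be spot-checked directly from (8) and (9). The following is a minimal Python sketch (standard library only; function names are ours) that assumes the coefficients exactly as displayed in (8) and (9). It confirms that the five probabilities sum to one and are nonnegative, and it compares their Fisher matrix with H_q at a few interior points. Gradients are taken by central differences, which are exact, apart from rounding, for these quadratic expressions.

```python
import math


def helstrom(v):
    """Helstrom information matrix (4)."""
    r2 = sum(c * c for c in v)
    return [[(1.0 if i == j else 0.0) + v[i] * v[j] / (1.0 - r2)
             for j in range(3)] for i in range(3)]


def probs_n2(v):
    """The five outcome probabilities of eqs. (8)-(9) for N = 2."""
    x, y, z = v
    s2, s3 = math.sqrt(2.0), math.sqrt(3.0)
    p = [0.25 * (1.0 - (x * x + y * y + z * z)),
         3.0 / 16.0 * (1.0 + z) ** 2,
         (8 * x * x - 4 * s2 * x * (z - 3) + (z - 3) ** 2) / 48.0]
    for sgn in (1.0, -1.0):  # the "+" and "-" members of the pair (9)
        p.append((9 + 2 * x * x + sgn * 4 * s3 * x * y + 6 * y * y
                  + 2 * s2 * (x + sgn * s3 * y) * (z - 3) - 6 * z + z * z) / 48.0)
    return p


def fisher_numeric(prob_fn, v, h=1e-4):
    """Fisher matrix sum_k (d_i p_k)(d_j p_k) / p_k with central-difference gradients.

    Each probability here is quadratic in (x, y, z), so the central difference
    reproduces the exact gradient up to floating-point rounding.
    """
    p0 = prob_fn(v)
    grads = []
    for i in range(3):
        vp, vm = list(v), list(v)
        vp[i] += h
        vm[i] -= h
        grads.append([(a - b) / (2 * h) for a, b in zip(prob_fn(vp), prob_fn(vm))])
    return [[sum(grads[i][k] * grads[j][k] / p0[k] for k in range(len(p0)))
             for j in range(3)] for i in range(3)]


if __name__ == "__main__":
    v = (0.3, -0.4, 0.5)
    f, hq = fisher_numeric(probs_n2, v), helstrom(v)
    print(sum(probs_n2(v)), max(abs(f[i][j] - hq[i][j]) for i in range(3) for j in range(3)))
```

Incidentally, the last three probabilities are perfect squares, (2√2x - z + 3)²/48 and (√2x ± √6y + z - 3)²/48, which makes their nonnegativity on the Bloch ball evident.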

2. N = 3

For an optimal minimal set of measurements for N = 3, we can take the eight probabilities consisting of four pairs [expressions illegible in the source]. The associated Fisher information matrix is expressible as

[equation illegible in the source]   (10)

where a = 2(1 - xy - xz - yz) and b = -1 + r². The second summand in (10) is negative definite, with two of its three negative eigenvalues equal to [illegible]. Meanwhile, 3H_q(x, y, z) is the upper bound on the Fisher information matrix provided by the Cramer-Rao theorem.

3. N = 4

An optimal minimal set of measurements for N = 4 yields a fifteen-vector of probabilities. The Fisher information matrix for this probability distribution is

$$3H_q(x,y,z) + \frac{1}{12}\begin{pmatrix} -7 - 5y^2 - 5z^2 & 5xy & 5xz \\ 5xy & -7 - 5x^2 - 5z^2 & 5yz \\ 5xz & 5yz & -7 - 5x^2 - 5y^2 \end{pmatrix}. \qquad (11)$$

The second term is negative definite, with one eigenvalue equal to -7/12 and the other two equal to -(7 + 5r²)/12. If we subtract (11) from the Cramer-Rao upper bound 4H_q(x, y, z), we obtain (as we must) a nonnegative definite matrix. It has two eigenvalues equal to (19 + 5r²)/12 and one equal to 7/12 + 1/(1 - r²).
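The eigenvalue statements above follow from a short calculation. It is sketched here assuming the form of (11) displayed above, with v = (x, y, z)ᵀ and I_3 the identity (our notation). The residual matrix is a combination of I_3 and the dyad vvᵀ, so it shares the eigenvectors of H_q = I_3 + vvᵀ/(1 - r²).

```latex
% Residual term of (11):
R_4 = \frac{1}{12}\left[-(7 + 5r^2)\, I_3 + 5\,\mathbf{v}\mathbf{v}^{T}\right]
% Along v:           \frac{1}{12}\left[-(7 + 5r^2) + 5r^2\right] = -\frac{7}{12}
% Orthogonal to v:   -\frac{7 + 5r^2}{12}   (twice)
% Cramer-Rao margin, using the eigenvalues 1/(1 - r^2) and 1 of H_q:
4H_q - F_4 = H_q - R_4 :\qquad
\frac{1}{1 - r^2} + \frac{7}{12}\ \ (\text{along } \mathbf{v}), \qquad
1 + \frac{7 + 5r^2}{12} = \frac{19 + 5r^2}{12}\ \ (\text{twice})
```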
4. N = 5



For N = 5, a twenty-vector of probabilities was obtained for the optimal minimal number of measurements. The Fisher information matrix can be expressed as the sum of 4H_q(x, y, z) (which dominates it, while 3H_q(x, y, z) does not) and a negative definite matrix. One of the three negative eigenvalues of that matrix is a negative multiple of (5 + 3r²) [coefficient illegible in the source]. This negative definite matrix can be written as the product of a scalar factor [illegible in the source] and a 3x3 matrix, the (1, 1) cell of which is

$$-2\big(-20 + 7y^4 + 9y^3z - 11z^2 + 7z^4 - 5x^3(y + z) + 3yz(5 + 3z^2) + 3x(y + z)(5 + 3y^2 + 3z^2) + x^2(10 + 7y^2 - 5yz + 7z^2) + y^2(-11 + 14z^2)\big), \qquad (12)$$

and the (1, 2) off-diagonal entry is

$$-5x^4 + 14x^3y + 2x^2(5 + 9y^2 + 14yz - 5z^2) - 5(-1 + y^2 + z^2)^2 + 14xy\big(-3 + (y + z)^2\big). \qquad (13)$$

The remaining cells are obtainable by simple symmetry arguments (for example, the (2, 2) cell can be obtained by interchanging x and y in (12)).



5. N = 6 

For N = 6, we used an optimal (but not minimal) set of thirty-three measurements. Using a large number of randomly generated points (x, y, z), we found that the associated Fisher information matrix was strictly dominated by 5H_q(x, y, z), but not by 4.99H_q(x, y, z). The Fisher information matrix takes the form (cf. (11))

$$5H_q(x,y,z) + \frac{1}{120}\begin{pmatrix} a & Axy & Axz \\ Axy & b & Ayz \\ Axz & Ayz & c \end{pmatrix}, \qquad (14)$$

where

$$A = 193 - 31r^2, \qquad a = -125 - 146y^2 - 146z^2 + 31(y^2 + z^2)^2 + x^2(47 + 31y^2 + 31z^2), \qquad (15)$$

and the diagonal entry b can be obtained from a by interchanging x and y, and c from a by interchanging x and z.

One of the three negative eigenvalues of the second ("residual") matrix in (14) is (125 - 172r² + 47r⁴)/(120(-1 + r²)). Now suppose we were to rewrite (14) in the form of 4.99H_q(x, y, z) plus a slightly revised residual matrix. The eigenvalue in question would then be altered only in that the constant 125 would change to 123.8. This would render it positive for r > 0.992348, leading to a loss of strict dominance for r ∈ [0.992348, 1]. In this specific sense, the upper bound of 5H_q(x, y, z) on the Fisher information matrix is tight. The residual matrix for N = 4 strictly dominates that for N = 6. This indicates that the "fit" of (N - 1)H_q(x, y, z) to the Fisher information matrix for optimal measurements of N copies improves as N increases.
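The threshold 0.992348 can be reproduced from the quoted eigenvalue alone. Note first that 125 - 172r² + 47r⁴ = (1 - r²)(125 - 47r²), so that eigenvalue simplifies to (47r² - 125)/120. Along v = (x, y, z), the matrix cH_q - F_6 then has eigenvalue (c - 5)/(1 - r²) - (47r² - 125)/120. The minimal Python sketch below (function names are ours) locates the radius where this changes sign, both by bisection and from the equivalent quadratic in t = r².

```python
import math


def radial_margin(r, c):
    """Eigenvalue along v = (x, y, z) of c*H_q - F_6, with F_6 as in eq. (14).

    H_q has eigenvalue 1/(1 - r^2) along v (cf. eq. (5)); the residual matrix
    in (14) has eigenvalue (47 r^2 - 125)/120 there (the value quoted above).
    """
    return (c - 5.0) / (1.0 - r * r) - (47.0 * r * r - 125.0) / 120.0


def loss_of_dominance(c, lo=0.0, hi=1.0 - 1e-12, iters=200):
    """Bisection for the radius beyond which c*H_q stops dominating F_6 (c < 5)."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if radial_margin(mid, c) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def threshold_closed_form(c):
    """Same radius from 47 t^2 - 172 t + (125 + 120 (c - 5)) = 0 with t = r^2."""
    k = 125.0 + 120.0 * (c - 5.0)  # equals 123.8 when c = 4.99
    t = (172.0 - math.sqrt(172.0 ** 2 - 4.0 * 47.0 * k)) / (2.0 * 47.0)
    return math.sqrt(t)


if __name__ == "__main__":
    print(loss_of_dominance(4.99), threshold_closed_form(4.99))
```

For c = 5 the margin (125 - 47r²)/120 stays positive on the whole ball, which is the strict dominance by 5H_q reported above.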



6. N = 7 



For N = 7, employing a 42-vector of probabilities, we found the Fisher information matrix to be strictly dominated by 6H_q(x, y, z), but not by 5.99H_q(x, y, z). Reviewing our previous analyses, we then found that the analogous situation held also for N = 3, ..., 6. That is, the Fisher information matrix was dominated by (N - 1)H_q(x, y, z), but not by (N - 1.01)H_q(x, y, z). The violations of these diminished bounds occur for nearly pure states, that is, r ≈ 1.

Pursuing this line of thought, suppose we restrict consideration to the more mixed states for which r < [illegible]. Then for N = 7 we have found that 3.9H_q(x, y, z), but not 3.85H_q(x, y, z), bounds the Fisher information matrix for the optimal set of measurements. Calculations suggest the hypothesis that, in the neighborhood of the fully mixed state r = 0, the bound on the Fisher information matrices approaches NH_q(0, 0, 0)/2 from above, that is, N/2 times the 3x3 identity matrix. Now, the fully mixed state is classical (binomial) in character, while the pure states are quantum in nature. (It is interesting to note that Frieden finds that, in classical scenarios, only one-half of the bound or phenomenological information J is utilized in the intrinsic quantum information I [29, eqs. (5.39), (6.55)]. "In all covariant quantum theories (e.g., quantum mechanics, quantum gravity) I and J are exactly equal. In deterministic classical theories such as classical electromagnetics and general relativity I = J/2. But in statistical classical theories I = J again" [e-mail message from Frieden].)
7. N > 7



We are not able to proceed any further, that is, to N > 7, as there presently do not appear to be corresponding sets of optimal measurements. As a caveat, the reader should note that recreating the optimal measurements for the cases N = 6 and 7 requires relying on the quant-ph preprint version (9803066) of [?]. (Unlike the instances N < 6, these measurements were not formally demonstrated to be minimal in character.) The final, published paper contains certain errors, as confirmed in an e-mail from R. Tarrach, though no formal erratum has appeared.



B. Properties of the Computed Fisher Information Matrices 

1. Diagonal nature for even N in spherical coordinates 

We have found that the Fisher information matrices given above for the optimal measurements of Vidal et al [1] for both N = 4 and 6 are diagonal in spherical coordinates (r, θ, φ). For N = 4, this is

$$\frac{1}{12}\begin{pmatrix} \dfrac{29 + 7r^2}{1 - r^2} & 0 & 0 \\ 0 & r^2(29 - 5r^2) & 0 \\ 0 & 0 & r^2(29 - 5r^2)\sin^2\theta \end{pmatrix}, \qquad (16)$$

and for N = 6,

$$\frac{1}{120}\begin{pmatrix} \dfrac{475 + 172r^2 - 47r^4}{1 - r^2} & 0 & 0 \\ 0 & r^2(475 - 146r^2 + 31r^4) & 0 \\ 0 & 0 & r^2(475 - 146r^2 + 31r^4)\sin^2\theta \end{pmatrix}. \qquad (17)$$

For N = 2, we also have a corresponding diagonal matrix, namely (5).



Cox and Reid [37, p. 2] have listed three "consequences of orthogonality" of the parameterization of a Fisher information matrix, such as we have just observed. These are that: (i) the maximum likelihood estimates of the means of the parameters are asymptotically independent; (ii) the asymptotic standard error for estimating one parameter is the same whether the other parameters are treated as known or unknown; and (iii) there may be simplifications in the numerical determination of the means of the parameters. "While orthogonality can always be achieved locally, global orthogonality is possible only in special cases" [37, p. 2]. In the discussions accompanying [37], Sweeting identifies four advantages to orthogonalization: computation, approximation, interpretation, and elimination of nuisance parameters. Barndorff-Nielsen, as well as Moolgavkar and Prentice, explain parameter orthogonality in terms of Frobenius' Theorem. The latter authors also indicate that the theorem of de Rham [38, p. 187] gives necessary and sufficient conditions for each orthogonal parameter to be independent of the others (as they are not in our three even-N examples just given).



2. Pure- and fully mixed state limits 

Again using spherical coordinates, it is interesting to note that for the odd cases N = 3, 5, 7, in the pure state limit (r → 1), the off-diagonal elements of the corresponding 3x3 Fisher information matrix converge to zero. In all six (both odd and even) cases, in this same limit, the (1,1)-entries are indeterminate, the (2,2)-entries are N/2 and the (3,3)-entries are N sin²θ/2.

For the fully mixed state, r = 0 (allowing the angular variables θ and φ to remain free), the only non-zero entry is the (1,1)-cell. For N = 2 it is 1, and for N = 3 it is

$$\frac{1}{6}\big(10 + \sin 2\theta(\cos\phi + \sin\phi) + \sin^2\theta\sin 2\phi\big). \qquad (18)$$

For N = 4 it is 29/12, for N = 5 it is [illegible in the source], for N = 6 it is 95/24, and for N = 7 it is

$$\frac{1}{96}\big(456\cos^2\theta + 7\sin 2\theta(\cos\phi + \sin\phi) + \sin^2\theta(456 + 7\sin 2\phi)\big). \qquad (19)$$
3. Integrals over the Bloch sphere of volume elements



Consider the integral of the volume element of the Fisher information matrix (that is, the square root of its determinant) over the (Bloch sphere of) two-level quantum systems. For N = 2 it is π² ≈ 9.8696, for N = 3 it is 21.0235, and for N = 4 it is

[closed form in terms of E and K, partly illegible in the source] ≈ 35.0281   (20)

(where E and K denote the corresponding elliptic integrals). For N = 5 it is 51.0763, for N = 6 it is 69.1253, and for N = 7 it is 88.8621. These particular results would be needed to apply the universal coding theorem of Clarke and Barron [18], discussed below in sec. IV A, to the optimal measurements of Vidal et al [1].
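The numerical values for the even cases can be reproduced by direct quadrature. The matrices (5), (16) and (17) are diagonal with entries F_rr, G(r) and G(r) sin²θ, so the volume element integrates over the angles to 4π ∫₀¹ √F_rr G(r) dr. The substitution r = sin u removes the 1/√(1 - r²) endpoint singularity. The Python sketch below (standard library only; the table and function names are ours) does this with the midpoint rule. It covers only N = 2, 4 and 6, whose matrices are given explicitly above.

```python
import math

# Diagonal spherical-coordinate Fisher matrices diag(F_rr, G, G sin^2(theta)):
# eq. (5) for N = 2, eq. (16) for N = 4, eq. (17) for N = 6.
# Each entry holds ((1 - r^2) * F_rr, G) as functions of t = r^2.
MODELS = {
    2: (lambda t: 1.0, lambda t: t),
    4: (lambda t: (29 + 7 * t) / 12, lambda t: t * (29 - 5 * t) / 12),
    6: (lambda t: (475 + 172 * t - 47 * t * t) / 120,
        lambda t: t * (475 - 146 * t + 31 * t * t) / 120),
}


def bloch_volume(n, steps=2000):
    """Integral of sqrt(det F) over the Bloch ball: 4*pi * int_0^1 sqrt(F_rr) G dr.

    With r = sin(u), dr / sqrt(1 - r^2) = du, so the integrand in u is smooth
    and the midpoint rule on [0, pi/2] converges very quickly.
    """
    frr_scaled, g = MODELS[n]
    h = (math.pi / 2) / steps
    total = 0.0
    for k in range(steps):
        t = math.sin((k + 0.5) * h) ** 2
        total += math.sqrt(frr_scaled(t)) * g(t)
    return 4 * math.pi * total * h


if __name__ == "__main__":
    for n in sorted(MODELS):
        print(n, round(bloch_volume(n), 4))
```

For N = 2 the integral reduces to 4π ∫₀^{π/2} sin²u du = π², which serves as a sanity check on the quadrature.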



C. Gill-Massar Traces 

Let us first observe that Gill and Massar [?, eq. (26)] asserted that the upper (quantum [Helstrom] Cramer-Rao) bound NH_q was not, in general, achievable in a multiparameter setting. This does appear to be strictly the case. However, consider our results for N = 2, ..., 7 for the three-parameter 2x2 density matrices. They indicate that, using the optimal measurements of Vidal et al [1], one can come indefinitely close to this bound for the nearly pure states by choosing N large enough.

To relate our results further to these analyses of Gill and Massar, we have computed, for N = 2, ..., 7, the traces of the product of H_q(x, y, z)⁻¹, given in (7), and the Fisher information matrices we have obtained using the optimal measurements of Vidal et al. (The traces of Fisher information matrices play a central role in the work of Frieden on the fundamental equations of physics [29, sec. 2.3.2].) For the estimation of pure states, Theorem I in [?] asserts that this trace quantity is bounded above by N. Theorem II there says that the same bound applies to mixed states, with the restriction to separable measurements. It is also demonstrated there that these bounds are attainable, and, for large N, simultaneously for all states.

For N = 2, it is easy to see, in the context of the results above, that this ("Gill-Massar") trace is simply 3. For N = 3, we get another constant, 5, for the trace. For N = 4, we obtain

$$GM_4 = \frac{29 - r^2}{4}, \qquad (21)$$

which is 7 for pure states and 7.25 for the fully mixed state. For N = 5, the Gill-Massar trace is

$$GM_5 = \frac{19 - r^2}{2}, \qquad (22)$$

which is 9 for pure states and 9.5 for the fully mixed state. For N = 6, it is

$$GM_6 = \frac{95 - 8r^2 + r^4}{8}. \qquad (23)$$

This last expression decreases monotonically from 95/8 = 11.875 at r = 0 to 11, that is, 2N - 1, at r = 1. For N = 7, the Gill-Massar trace is

$$GM_7 = \frac{57 - 6r^2 + r^4}{4}, \qquad (24)$$

which equals 57/4 = 14.25 at r = 0 and 13 at r = 1, again 2N - 1. (In an earlier version of this paper, quant-ph/0002063, the results given for N = 7, including Fig. 1 plotting the Gill-Massar trace, were "anomalous" in this regard. We subsequently ascertained that they were erroneous, owing to a programming error.) In Fig. 1, we plot GM_N/(2N - 1) for N = 4, 5, 6 and 7.
[Figure 1: "Scaled GM-traces" plotted against r, 0 ≤ r ≤ 1.]

FIG. 1. Gill-Massar traces for N = 4, 5, 6 and 7 scaled by their values at the pure states, r = 1, that is, 2N - 1. The y-intercepts for r = 0, corresponding to the fully mixed state, increase with N.



It is easy to see, then, that in these six cases the Gill-Massar bound [?, eq. (27)] of N is violated. Theorem III of their paper recognizes that this will occur for non-separable measurements. So we obtain a simple pattern of 2N - 1 for the minimum of the trace quantity in question. Regarding these results, R. Gill remarked in an e-mail message of Feb. 18, 2000 that "this is all very interesting. It means that there is a big discontinuity at the surface of the Bloch sphere (where none of these 3x3 Fisher information matrices is well-defined), and it means that the gain in using joint measurements over separate measurements for mixed states is substantial throughout the Bloch sphere".
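Given the forms of (11) and (14)-(15) displayed above, the closed forms (21) and (23) can be spot-checked numerically. The following Python sketch (standard library only; helper names are ours) forms tr(H_q⁻¹F) with H_q⁻¹ = I - vvᵀ from (7). It compares the result with (29 - r²)/4 and (95 - 8r² + r⁴)/8 at a few sample points, and it also confirms the constant trace 3 for N = 2, where F = H_q.

```python
def helstrom(v):
    """Helstrom information matrix (4)."""
    r2 = sum(c * c for c in v)
    return [[(1.0 if i == j else 0.0) + v[i] * v[j] / (1.0 - r2)
             for j in range(3)] for i in range(3)]


def fisher_n4(v):
    """Eq. (11): 3 H_q plus the displayed residual matrix divided by 12."""
    x, y, z = v
    m = [[-7 - 5 * y * y - 5 * z * z, 5 * x * y, 5 * x * z],
         [5 * x * y, -7 - 5 * x * x - 5 * z * z, 5 * y * z],
         [5 * x * z, 5 * y * z, -7 - 5 * x * x - 5 * y * y]]
    h = helstrom(v)
    return [[3 * h[i][j] + m[i][j] / 12.0 for j in range(3)] for i in range(3)]


def fisher_n6(v):
    """Eqs. (14)-(15): 5 H_q plus the displayed residual matrix divided by 120."""
    x, y, z = v
    big_a = 193 - 31 * (x * x + y * y + z * z)

    def diag(p, q, s):
        # Entry a of (15) with p in the role of x; b and c follow by relabelling.
        return (-125 - 146 * q * q - 146 * s * s + 31 * (q * q + s * s) ** 2
                + p * p * (47 + 31 * q * q + 31 * s * s))

    m = [[diag(x, y, z), big_a * x * y, big_a * x * z],
         [big_a * x * y, diag(y, x, z), big_a * y * z],
         [big_a * x * z, big_a * y * z, diag(z, y, x)]]
    h = helstrom(v)
    return [[5 * h[i][j] + m[i][j] / 120.0 for j in range(3)] for i in range(3)]


def gill_massar_trace(fisher, v):
    """tr(H_q^{-1} F), using H_q^{-1} = I - v v^T from eq. (7)."""
    f = fisher(v)
    hinv = [[(1.0 if i == j else 0.0) - v[i] * v[j] for j in range(3)]
            for i in range(3)]
    return sum(hinv[i][j] * f[j][i] for i in range(3) for j in range(3))


if __name__ == "__main__":
    for v in [(0.0, 0.0, 0.0), (0.3, -0.4, 0.5), (0.05, -0.9, 0.3)]:
        r2 = sum(c * c for c in v)
        print(gill_massar_trace(fisher_n4, v), (29 - r2) / 4,
              gill_massar_trace(fisher_n6, v), (95 - 8 * r2 + r2 * r2) / 8)
```

Because each residual matrix is a combination of I and vvᵀ, the trace depends on the point only through r², which is why (21) and (23) are functions of r alone.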



D. Analyses for m-Level Pure States 



1. m = 2

In a further effort to relate our results to the analyses of Gill and Massar [?], let us consider for the moment simply the two-level pure states, so we set r = 1. In terms of the polar coordinates (θ, φ), the Helstrom information matrix takes the form (cf. (5), [?, p. 4238])

$$\begin{pmatrix} 1 & 0 \\ 0 & \sin^2\theta \end{pmatrix}. \qquad (25)$$



Then, the Fisher information matrix for the optimal measurements of N copies [1] is simply N/2 times (25), as we have confirmed through computations for N = 2, ..., 7 (cf. [?]). (So, in the pure state case, unlike the mixed state one, the quantum Cramer-Rao bound of N times (25) is not asymptotically approached, though the Gill-Massar trace bound of N is achievable.)
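Both statements in the preceding parenthesis follow directly from the proportionality F_N = (N/2)H, where H denotes (25). The following one-line check uses our notation F_N for the N-copy Fisher matrix.

```latex
% Pure two-level states, polar coordinates (theta, phi); H = diag(1, sin^2 theta) as in (25)
F_N = \frac{N}{2}\, H
\quad\Longrightarrow\quad
\operatorname{tr}\!\left(H^{-1} F_N\right) = \frac{N}{2}\,\operatorname{tr}(I_2) = N ,
\qquad
N H - F_N = \frac{N}{2}\, H \neq 0 .
```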



2. m = 3 



We have also verified that the same basic additive relation holds in the case of the three-level pure states for N = 2, using the formulas in [?]. Let us use the parameterization of these states in terms of four angular variables (θ, φ, χ₁, χ₂) employed in [41, eq. (2.1)],

$$|\psi\rangle = e^{i\chi_1}\sin\theta\cos\phi\,|1\rangle + e^{i\chi_2}\sin\theta\sin\phi\,|2\rangle + \cos\theta\,|3\rangle. \qquad (26)$$

Then, the Helstrom information matrix is

$$\begin{pmatrix} 4 & 0 & 0 & 0 \\ 0 & 4\sin^2\theta & 0 & 0 \\ 0 & 0 & a & -\sin^4\theta\sin^2 2\phi \\ 0 & 0 & -\sin^4\theta\sin^2 2\phi & b \end{pmatrix}, \qquad (27)$$

where (cf. [?])

$$a = \tfrac{1}{2}\big(6 + 2\cos 2\theta + \cos 2(\theta - \phi) - 2\cos 2\phi + \cos 2(\theta + \phi)\big)\sin^2\theta\cos^2\phi,$$
$$b = -\tfrac{1}{2}\big(-6 - 2\cos 2\theta + \cos 2(\theta - \phi) - 2\cos 2\phi + \cos 2(\theta + \phi)\big)\sin^2\theta\sin^2\phi. \qquad (28)$$

(Note that (27) is free of the variables χ₁ and χ₂, just as (25) is free of φ.) So, for N = 2 copies of a spin-1 system, the Fisher information matrix is identically (27). This parallels the specific results for both the pure and mixed two-level quantum systems for N = 2. We also intend to analyze the case N = 3, using the specific prescription for the corresponding optimal measurements in [?, sec. 6].
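The matrix (27) can be checked numerically from the pure-state form of the Helstrom information, 4 Re(⟨∂ᵢψ|∂ⱼψ⟩ - ⟨∂ᵢψ|ψ⟩⟨ψ|∂ⱼψ⟩). The Python sketch below (standard library only; helper names are ours) differentiates (26) by central differences in (θ, φ, χ₁, χ₂). It then compares the result with the entries of (27)-(28), taking the off-diagonal χ₁χ₂ entry with the negative sign shown in (27).

```python
import cmath
import math


def psi(params):
    """Three-level pure state (26); params = (theta, phi, chi1, chi2)."""
    th, ph, c1, c2 = params
    return [cmath.exp(1j * c1) * math.sin(th) * math.cos(ph),
            cmath.exp(1j * c2) * math.sin(th) * math.sin(ph),
            complex(math.cos(th))]


def inner(u, w):
    """<u|w> for lists of complex amplitudes."""
    return sum(a.conjugate() * b for a, b in zip(u, w))


def helstrom_pure(params, h=1e-6):
    """4 Re(<d_i psi|d_j psi> - <d_i psi|psi><psi|d_j psi>) via central differences."""
    base = psi(params)
    d = []
    for i in range(4):
        p_plus, p_minus = list(params), list(params)
        p_plus[i] += h
        p_minus[i] -= h
        d.append([(a - b) / (2 * h) for a, b in zip(psi(p_plus), psi(p_minus))])
    return [[4 * (inner(d[i], d[j]) - inner(d[i], base) * inner(base, d[j])).real
             for j in range(4)] for i in range(4)]


def closed_form(params):
    """Entries of (27) with a and b from (28)."""
    th, ph = params[0], params[1]
    a = 0.5 * (6 + 2 * math.cos(2 * th) + math.cos(2 * (th - ph)) - 2 * math.cos(2 * ph)
               + math.cos(2 * (th + ph))) * math.sin(th) ** 2 * math.cos(ph) ** 2
    b = -0.5 * (-6 - 2 * math.cos(2 * th) + math.cos(2 * (th - ph)) - 2 * math.cos(2 * ph)
                + math.cos(2 * (th + ph))) * math.sin(th) ** 2 * math.sin(ph) ** 2
    off = -math.sin(th) ** 4 * math.sin(2 * ph) ** 2
    s2 = math.sin(th) ** 2
    return [[4.0, 0.0, 0.0, 0.0], [0.0, 4.0 * s2, 0.0, 0.0],
            [0.0, 0.0, a, off], [0.0, 0.0, off, b]]


if __name__ == "__main__":
    p = (0.7, 0.4, 0.3, -1.1)
    num, ref = helstrom_pure(p), closed_form(p)
    print(max(abs(num[i][j] - ref[i][j]) for i in range(4) for j in range(4)))
```

The numerical matrix does not depend on χ₁ or χ₂, consistent with the remark that (27) is free of these variables.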



3. Supplementary analysis for 3-level mixed states

Following the general methodology laid out by Vidal et al [1] for the two-level mixed quantum systems, we have attempted to construct an optimal measurement scheme for N = 2 copies of mixed three-level systems. In doing so, we incorporated the optimal measurements for N = 2 copies of pure three-level quantum systems presented by Acín, Latorre and Pascual in [?, sec. 5], which were utilized immediately above. (J. Latorre informs me that he and his co-authors "did not find any manageable way to make progress" in such extended m = 3 mixed cases. He did, however, point out that Arvind had recast and further developed many of their results using Penrose rays, in apparently as yet unpublished work.) This led us to an oprom with twelve distinct outcomes. Nine correspond to the vectors explicitly presented in [?, eqs. (39), (40)], and the additional three come from our own orthogonal decomposition of the associated rank-three "residual" projector (cf. [1, eq. (3.3)]). (A weight of [illegible] was applied to the subset of nine outcomes.)

With this twelve-outcome oprom in hand, we found by numerical means that the Gill-Massar trace equalled a constant, 6. (For N = 2 copies of two-level systems, this trace quantity was found in sec. III C also to be a constant, 3.) In [12], we have been investigating the possibility of symbolically inverting the 8x8 Helstrom information matrix, making use of a recently developed Euler angle parameterization of the 3x3 density matrices. The Gill-Massar trace would, of course, be the trace of the product of this inverse matrix and the Fisher information matrix associated with the twelve-outcome oprom. This result and our earlier ones for m = 2, N = 2, ..., 7 lead us to a conjecture. For non-separable optimal measurements of N m-level quantum systems, we conjecture that the Gill-Massar trace, for all m and N, is exactly (2N - 1)(m - 1) in the pure state limit, and no less than this for any mixed state.
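For orientation, the conjectured value can be set against the cases actually computed above and against the pure-state bound N(m - 1). The notation T(N, m) for the conjectured minimum is ours.

```latex
% Conjectured pure-state minimum of the Gill-Massar trace
T(N, m) = (2N - 1)(m - 1)
% m = 2, N = 2,...,7:  T = 3, 5, 7, 9, 11, 13   (the r = 1 values found in sec. III C)
% m = 3, N = 2:        T = 3 \cdot 2 = 6         (the constant found numerically above)
% Excess over the pure-state bound N(m - 1):
T(N, m) - N(m - 1) = (N - 1)(m - 1)
```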

Now, for any measurement of a strictly pure state itself, the Gill-Massar trace cannot exceed N(m - 1), by Theorem I of [?]. (This bound is known to be achievable for m = 2 by Theorem VII of [?], and for mixed states using separable measurements by Theorem VI.) So non-separable optimal measurements display a clear discontinuity near the pure state boundary. They also yield considerably increased efficiency in estimating strictly mixed or impure states.



4. m = 4 



We have ascertained the Helstrom information matrix for pure states of four-level systems, making use of the appropriate analogue of the parameterization (26) presented in [?, eq. (13)]. The six parameters naturally divide into two sets of three. Once again, the entries of the Helstrom information matrix are free of the (three) members of one of the two sets.



IV. UNIVERSAL CODING 



We can also apply certain important (classical) asymptotic results of Clarke and Barron to the three-dimensional family of quadrinomial probability distributions (2). These results pertain to a number of problems, including those of universal data compression and density estimation. We can then compare their formulas with those for the 2x2 density matrices (1), based on the extension of this work of Clarke and Barron to the quantum domain of two-level systems by Krattenthaler and Slater [19], [20] (cf. [?]). (In what follows, we will denote probability distributions of a general nature by w and more specific ones by W. As noted before, we subscript them by either c or q to denote a result stemming from an analysis in the classical or quantum domain.)
A. Classical results of Clarke and Barron



Clarke and Barron examined the relative entropy (as N → ∞) between a true density function and a joint ("Bayesian") density function for a sequence of N random variables. The latter is taken to be the average of the possible densities (comprising a parameterized family) with respect to a (prior) probability distribution over this family of density functions. The result of Clarke and Barron for the asymptotic relative entropy (Kullback-Leibler index) between the true density and the mixture is

$$\frac{d}{2}\log\frac{N}{2\pi e} + \frac{1}{2}\log|I_c(\alpha)| - \log w_c(\alpha) + o(1), \qquad (29)$$

where α denotes the d-vector of variables parameterizing the family of densities, w_c(α) a prior probability distribution used to average the N-fold products of independent identical density functions, and I_c(α) the associated d x d Fisher information matrix. As applied to our particular three-parameter (d = 3) family of quadrinomial distributions (2), with α = (r, θ, φ), we have

$$|I_c(r,\theta,\phi)| = \frac{64}{1 - r^2}\, r^4 \sin^2\theta. \qquad (30)$$

Then, we choose for the probability distribution w_c(α) the particular one

$$W_c(r,\theta,\phi) = \frac{1}{\pi^2\sqrt{1 - r^2}}\, r^2\sin\theta \;\propto\; \sqrt{|I_c(r,\theta,\phi)|}. \qquad (31)$$

With this choice, the asymptotic relative entropy between the true density and its Bayesian (mixture) average assumes the form [?, eq. (1.4)]

$$\frac{3}{2}\log\frac{N}{2\pi e} + \log 8\pi^2 + o(1). \qquad (32)$$
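The passage from (29)-(31) to (32) takes two lines. First, normalizing the volume element √|I_c| from (30) over the Bloch ball gives the constant 8π². Second, with w_c = W_c, the parameter dependence in (29) cancels.

```latex
% Normalization of the Fisher volume element sqrt|I_c| = 8 r^2 \sin\theta / \sqrt{1 - r^2}, from (30):
\int_0^{1}\!\int_0^{\pi}\!\int_0^{2\pi} \frac{8 r^2 \sin\theta}{\sqrt{1 - r^2}}\, d\phi\, d\theta\, dr
  = 8 \cdot \frac{\pi}{4} \cdot 2 \cdot 2\pi = 8\pi^2 ,
\qquad W_c = \frac{\sqrt{|I_c|}}{8\pi^2} .
% Substituting w_c = W_c into (29) with d = 3:
\frac{1}{2}\log|I_c| - \log W_c
  = \frac{1}{2}\log|I_c| - \log\sqrt{|I_c|} + \log 8\pi^2 = \log 8\pi^2 .
```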

(Let us note that r² sin θ dr dθ dφ is the volume element in spherical coordinates, r² sin θ being the Jacobian determinant of the transformation from Cartesian to spherical coordinates.) Our particular selection of W_c(r, θ, φ) is "Jeffreys' prior" for this case. That is, it is the normalized (over the Bloch sphere) form of the volume element √|I_c(r, θ, φ)| of the Fisher information metric (cf. sec. III B 3). (The normalization factor, 8π², is evident in (32).) Jeffreys' priors, as shown by Clarke and Barron [18], fulfill the desideratum of yielding the common minimax and maximin of the asymptotic relative entropy. In the quantum analogue, though, (31) does not play this distinguished role, although a close ("quasi-Bures") relative of it does [20, ?]. This probability distribution is



$$ W_q(r,\theta,\phi) = 0.0832258\, \frac{e}{1-r^2}\left(\frac{1-r}{1+r}\right)^{1/(2r)} r^2 \sin\theta. \qquad (33) $$
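Both normalizations can be checked numerically. The sketch below is our own illustration: it takes (31) and the form of (33) given above, integrates over the unit ball (the angular part contributes 4π), and confirms that √|I_c| integrates to 8π² and that (33) has total mass 1. The substitution r = 1 − u² is a numerical convenience that removes the integrable (1 − r)^(−1/2) singularity at the pure-state boundary.

```python
import math

def jeffreys_radial_integral(n=20000):
    """Midpoint rule for int_0^1 r^2 / sqrt(1 - r^2) dr via r = sin(t); exact value pi/4."""
    h = (math.pi / 2) / n
    return h * sum(math.sin((i + 0.5) * h) ** 2 for i in range(n))

def quasi_bures_radial(r):
    """Radial part of (33) without its constant: e/(1-r^2) * ((1-r)/(1+r))**(1/(2r)) * r^2."""
    return math.e / (1.0 - r * r) * ((1.0 - r) / (1.0 + r)) ** (1.0 / (2.0 * r)) * r * r

def quasi_bures_radial_integral(n=200000):
    """Midpoint rule after substituting r = 1 - u^2, dr = 2u du (u in (0, 1))."""
    h = 1.0 / n
    total = 0.0
    for i in range(n):
        u = (i + 0.5) * h
        total += 2.0 * u * quasi_bures_radial(1.0 - u * u)
    return h * total

# sqrt|I_c| = 8 r^2 sin(theta) / sqrt(1 - r^2); the angular integral gives 4*pi.
jeffreys_norm = 8.0 * 4.0 * math.pi * jeffreys_radial_integral()
# Total probability mass of (33) with the quoted constant 0.0832258.
qb_mass = 0.0832258 * 4.0 * math.pi * quasi_bures_radial_integral()

if __name__ == "__main__":
    print(jeffreys_norm, 8 * math.pi**2)
    print(qb_mass)
```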



B. Quantum Results of Krattenthaler and Slater for Two-Level Systems 

Krattenthaler and Slater [19, 20] have sought to extend the general results of Clarke and Barron to two-level quantum systems. They averaged the N-fold tensor products of identical 2 × 2 density matrices (rather than the simple products of N random variables) with respect to (spherically-symmetric/unitarily-invariant) probability distributions of the form w_q(r) r² sin θ (cf. [19, eq. (1.4)]). The analogue (in terms of the quantum relative [von Neumann] entropy) of the Clarke-Barron result (29) is then (d = 3)



$$ \frac{3}{2}\log\frac{N}{2\pi e} + \frac{1}{2}\log I_q(r) - \log w_q(r) + o(1), \qquad (34) $$

where

$$ I_q(r) = \frac{e^2}{(1-r^2)^2}\left(\frac{1-r}{1+r}\right)^{1/r}. \qquad (35) $$



So,

$$ I_q(r)\, r^4 \sin^2\theta = 144.372\, W_q(r,\theta,\phi)^2, \qquad (36) $$

which can be compared with its classical counterpart,

$$ |I_c(r,\theta,\phi)| = 64\pi^4\, W_c(r,\theta,\phi)^2, \qquad (37) $$

where 64π⁴ ≈ 6234.18.
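The two numerical constants can be traced back to the normalizations. In the sketch below (our own check, assuming (30), (31) and the constant quoted in (33)), 144.372 turns out to be 1/0.0832258², and (37) holds pointwise.

```python
import math

C_QB = 0.0832258  # normalization constant quoted in (33)

def det_fisher_classical(r, theta):
    """|I_c(r, theta, phi)| from (30)."""
    return 64.0 * r**4 * math.sin(theta) ** 2 / (1.0 - r**2)

def jeffreys_prior(r, theta):
    """W_c(r, theta, phi) from (31)."""
    return r**2 * math.sin(theta) / (math.pi**2 * math.sqrt(1.0 - r**2))

if __name__ == "__main__":
    print(64 * math.pi**4)   # compare with 6234.18 in (37)
    print(1 / C_QB**2)       # compare with 144.372 in (36)
    r, theta = 0.7, 0.9
    print(det_fisher_classical(r, theta), 64 * math.pi**4 * jeffreys_prior(r, theta) ** 2)
```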

As noted [20], the quasi-Bures probability distribution W_q(r, θ, φ), given by (33), fulfills in the quantum domain of two-level systems the distinguished role (in yielding the common asymptotic minimax and maximin) played in the classical sector by the Jeffreys' prior (that is, the normalized volume element of the Fisher information metric). In Fig. 2 we plot the term ½ log I_q(r), present in (34), along with the comparable (but always larger for r < 1) classical term based on (30). The units of the vertical axis are, then, "nats" of information. (A nat is equal to 1/log_e 2 ≈ 1.4427 bits.) So, in the example above, one achieves a lower relative entropy (redundancy) by proceeding in the quantum domain, as opposed to the classical one.



FIG. 2. Quantum asymptotic relative entropy term, ½ log I_q(r), and its larger classical counterpart, plotted (in nats) against radial distance r in the Bloch sphere of two-level systems.



In the case r = 0 (the fully mixed state), the quantum (Krattenthaler/Slater) asymptotics is given by the expression

$$ \frac{3}{2}\log\frac{N}{2\pi e} - \log w_q(0) + o(1). \qquad (38) $$



For a pure state (r = 1), in the case that w_q(r) is continuous and nonzero at r = 1, the asymptotics is given, in general, by

$$ 2\log N - 3\log 2 - \log\pi - \log w_q(1) + o(1). \qquad (39) $$
However, for the particular case of the Jeffreys' prior (31), which is singular at r = 1, we have [19, eq. (2.53)]

(40) 



^logAT+ilog^-21og2. 



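It would be of interest to ascertain whether one can construct a probability distribution for which the (classical) Fisher information matrix is equal (in spherical coordinates) to [20, eq. (3.17)]

$$ I_{\text{quasi-Bures}}(r,\theta,\phi) = \begin{pmatrix} \frac{1}{1-r^2} & 0 & 0 \\ 0 & \frac{r^2 g(s)}{1+r} & 0 \\ 0 & 0 & \frac{r^2 g(s)\sin^2\theta}{1+r} \end{pmatrix}, \qquad (41) $$

where s = (1 − r)/(1 + r) and g(s) = e s^{s/(1−s)}.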



(If we employ g(s) = 2/(1 + s) in (41), we obtain the Helstrom information matrix H_q(r, θ, φ).) Such a construction would yield the quantum (but non-Helstrom) information matrix, the square root of the determinant of which is proportional to the quasi-Bures probability distribution (33). This probability distribution (rather than (31), as originally conjectured [19]) has been shown to yield the common minimax and maximin in the universal coding of two-level quantum systems [20].
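To make the role of g(s) in (41) concrete, here is a small sketch (our own, with hypothetical function names) that builds the diagonal of (41) for a given g. It checks that g(s) = 2/(1 + s) reproduces the Helstrom matrix diag(1/(1 − r²), r², r² sin²θ), that the maximal-monotone choice g(s) = (1 + s)/(2s) gives the tangential entry r²/(1 − r²), and that the square root of the determinant for the quasi-Bures choice has exactly the (r, θ)-dependence of (33) as written above.

```python
import math

def monotone_metric_diag(r, theta, g):
    """Diagonal entries of (41) in spherical coordinates, with s = (1 - r)/(1 + r)."""
    s = (1.0 - r) / (1.0 + r)
    tangential = r * r * g(s) / (1.0 + r)
    return 1.0 / (1.0 - r * r), tangential, tangential * math.sin(theta) ** 2

def g_bures(s):        # minimal monotone (Bures) metric: Helstrom information matrix
    return 2.0 / (1.0 + s)

def g_yuen_lax(s):     # maximal monotone metric (right logarithmic derivative)
    return (1.0 + s) / (2.0 * s)

def g_quasi_bures(s):  # e * s**(s/(1-s)); its limiting value at s = 1 is 1
    return 1.0 if s == 1.0 else math.e * s ** (s / (1.0 - s))

def sqrt_det(r, theta, g):
    """Square root of the determinant of the diagonal matrix (41)."""
    a, b, c = monotone_metric_diag(r, theta, g)
    return math.sqrt(a * b * c)

if __name__ == "__main__":
    print(monotone_metric_diag(0.6, 1.1, g_bures))
    print(monotone_metric_diag(0.6, 1.1, g_yuen_lax))
```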
C. Relations between Monotone Metrics and the Fisher Information Matrices Computed in Sec. III A



It would be of considerable interest to determine the precise nature, as N → ∞, of the Fisher information matrices corresponding to the use of optimal measurements ("For the case of mixed states of spin 1/2 particles, or for higher spins we do not know what the 'outer' boundary of the set of (rescaled) achievable Fisher information matrices based on arbitrary (non separable) measurements of N systems looks like. We have some indications about the shape of this set... and we know that it is convex and compact" [2, p. 19]). In particular, we would like to ascertain whether or not there is convergence in form (to a diagonal matrix in spherical coordinates) between even and odd values of N, as numerical evidence indicates, and whether or not the Fisher information matrices are asymptotically simply proportional to some specific member of a broad class of natural metric tensors (which includes the Bures and quasi-Bures metrics discussed in Sec. IV B) for the quantum states associated with operator monotone functions.



1. The (2,2)- and (3,3)-entries of the diagonal Fisher information matrices for even N



In fact, if we equate the (2,2)-entries of the diagonal Fisher information matrices given in sec. III B 1 for the optimal measurements for N = 4 and N = 6 to the (2,2)-cell of N times the general matrix (41) and solve for g(s), recalling that s = (1 − r)/(1 + r), we obtain for N = 4,

$$ g(s) = \frac{6 + 17s + 6s^2}{6(1+s)^3}, \qquad (42) $$

and for N = 6,

$$ g(s) = \frac{45 + 222s + 416s^2 + 222s^3 + 45s^4}{45(1+s)^5}. \qquad (43) $$

Both these symmetry-exhibiting functions, (42) and (43) (each satisfies g(1/s) = s g(s)), as well as the corresponding (Bures/minimal monotone) result (the equation of a hyperbola) for N = 2, that is,

$$ g(s) = \frac{1}{1+s}, \qquad (44) $$



are monotonically decreasing on the positive real axis (Fig. 3), but we are presently not aware (for the cases N = 4 and 6, that is) whether the reciprocals, f(s) = 1/g(s), are operator monotone functions, as required for membership in the class of monotone metrics of Petz and Sudar. (A function f(s), mapping the nonnegative real axis to itself, is called operator monotone if the relation 0 ≤ K ≤ H implies 0 ≤ f(K) ≤ f(H) for all matrices K and H of any order. The relation K ≤ H means that all the eigenvalues of H − K are nonnegative.)
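The definition can be probed numerically, though only partially. The sketch below is our own illustration, not a method used in this work: it searches random 2 × 2 pairs 0 ≤ K ≤ H for a violation of f(K) ≤ f(H), using the spectral calculus for symmetric matrices. Such a search can only disprove operator monotonicity (and only at matrix order 2); finding no violation is evidence, not a proof. All function names are hypothetical.

```python
import math
import random

def sym_eig2(a, b, c):
    """Eigenvalues l1 >= l2 of [[a, b], [b, c]] and a unit eigenvector (x, y) for l1."""
    mid, rad = (a + c) / 2.0, math.hypot((a - c) / 2.0, b)
    l1, l2 = mid + rad, mid - rad
    if rad == 0.0:                      # multiple of the identity: any vector works
        return l1, l2, 1.0, 0.0
    x, y = (l1 - c, b) if a >= c else (b, l1 - a)   # avoids cancellation
    n = math.hypot(x, y)
    return l1, l2, x / n, y / n

def apply_fn(f, a, b, c):
    """f(M) = f(l1) v v^T + f(l2) w w^T for symmetric M >= 0, with w = (-y, x)."""
    l1, l2, x, y = sym_eig2(a, b, c)
    f1, f2 = f(max(l1, 0.0)), f(max(l2, 0.0))
    return f1 * x * x + f2 * y * y, (f1 - f2) * x * y, f1 * y * y + f2 * x * x

def is_psd2(a, b, c, tol=1e-9):
    """A symmetric 2x2 matrix is >= 0 iff its diagonal and determinant are >= 0."""
    return a >= -tol and c >= -tol and a * c - b * b >= -tol

def random_psd(rng):
    p, q, s, t = (rng.uniform(-1.0, 1.0) for _ in range(4))
    return p * p + q * q, p * s + q * t, s * s + t * t   # A A^T for A = [[p, q], [s, t]]

def find_lowner_violation(f, trials=20000, seed=1):
    """Return a pair (K, H) with 0 <= K <= H but f(H) - f(K) not >= 0, or None."""
    rng = random.Random(seed)
    for _ in range(trials):
        K = random_psd(rng)
        H = tuple(k + p for k, p in zip(K, random_psd(rng)))   # H - K is >= 0
        fK, fH = apply_fn(f, *K), apply_fn(f, *H)
        if not is_psd2(*(h - k for h, k in zip(fH, fK))):
            return K, H
    return None
```

One could then call `find_lowner_violation` with f(s) = 1/g(s) for (42) or (43); we report no such result here. As a calibration, f(s) = 1 + s (the reciprocal of (44)) is affine and hence operator monotone, while f(s) = s² fails for K = [[1, 1], [1, 1]] ≤ H = [[2, 1], [1, 1]].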



. 8 



. 6 



0.4 



0.2 




10 



2 4 6 8 

FIG. 3. Monotonically decreasing functions g(s), that is (42), (43) and (44), obtained by equating the (2,2)-entries of the computed Fisher information matrices (16), (17) and (11) for N = 4, 6 and 2, respectively, with N times the (2,2)-entry of the general matrix for a monotone metric. The curve for N = 6 dominates that for N = 4, which in turn dominates the hyperbola for N = 2.
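The symmetry, the common value g(0) = 1, the dominance ordering stated in the caption, and the monotone decrease of (42)-(44) can all be confirmed in exact rational arithmetic. The following sketch is our own check on a grid of rational s. The closed-form gaps g4 − g2 = 5s/(6(1 + s)³) and g6 − g4 = (9s + 142s² + 9s³)/(90(1 + s)⁵) were worked out by hand and are nonnegative for every s > 0.

```python
from fractions import Fraction

def g2(s):  # (44), N = 2
    return 1 / (1 + s)

def g4(s):  # (42), N = 4
    return (6 + 17 * s + 6 * s**2) / (6 * (1 + s) ** 3)

def g6(s):  # (43), N = 6
    return (45 + 222 * s + 416 * s**2 + 222 * s**3 + 45 * s**4) / (45 * (1 + s) ** 5)

grid = [Fraction(k, 8) for k in range(1, 81)]   # s = 1/8, 2/8, ..., 10

for s in grid:
    for g in (g2, g4, g6):
        assert g(1 / s) == s * g(s)             # symmetry g(1/s) = s g(s), exactly
    assert g4(s) - g2(s) == 5 * s / (6 * (1 + s) ** 3)
    assert g6(s) - g4(s) == (9 * s + 142 * s**2 + 9 * s**3) / (90 * (1 + s) ** 5)

for g in (g2, g4, g6):
    assert g(Fraction(0)) == 1
    values = [g(s) for s in grid]
    assert all(x > y for x, y in zip(values, values[1:]))   # strictly decreasing on the grid

print("symmetry, dominance and monotone decrease hold on the grid")
```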
If we were to include in Fig. 3 the corresponding function for the quasi-Bures monotone metric, it would be essentially indistinguishable from the hyperbola for N = 2 (corresponding to the Bures/minimal monotone metric).



2. The (1,1)-entries of the diagonal Fisher information matrices for even N

If, pursuing these lines of thought, one could develop a formula for arbitrary (even) N for the (2,2)-entry of the Fisher information matrix for optimal measurements, and then easily one for the (3,3)-entry (which would be the (2,2)-entry multiplied by sin²θ), the remaining question, of course, would be to obtain a general formula for the (1,1)-entry. In this regard, the apparent general result (established above for N = 2, ..., 7) that the Gill-Massar trace is 2N − 1 in the pure state limit might prove helpful. But since the (1,1)-entry of the metric tensor for any monotone metric (41) is always simply 1/(1 − r²), some asymptotic convergence to this expression would apparently be necessary, given that the (1,1)-entries of the computed Fisher information matrices (16) and (17) for N = 4 and 6 (and presumably for arbitrary even N) contain polynomials in r in their numerators, and not simply a constant term. In Fig. 4 we plot the (1,1)-entries, divided by N, of the computed Fisher information matrices, in spherical coordinates, for N = 2, 4 and 6.



FIG. 4. Scaled (1,1)-entries, that is, the (1,1)-entries divided by N, of the computed diagonal Fisher information matrices (11), (16) and (17) for N = 2, 4 and 6, respectively, plotted against r. The value at r = 0.9 is greatest for N = 6 and least for N = 2.



3. Modified Gill-Massar traces based on the Yuen-Lax (maximal monotone) and quasi-Bures information matrices



In sec. III C, we defined the Gill-Massar trace as the trace of the product of the inverse of the quantum Helstrom information matrix and the Fisher information matrices we had computed (sec. III A) based on the optimal (in terms of fidelity) measurements of Vidal et al for N = 2, ..., 7. The quantum Helstrom information matrix corresponds to the use of the minimal monotone (Bures) metric, as well as of the symmetric logarithmic derivative. We now replace it with the maximal monotone metric, corresponding to the right logarithmic derivative associated with Yuen and Lax. This can be accomplished by using g(s) = (1 + s)/(2s) in the (diagonal/orthogonal) metric tensor (41), rather than g(s) = 2/(1 + s) (which gives the quantum Helstrom information matrix). We then find that in the pure state limit (r → 1) the values of the so-modified traces are exactly N − 1 (rather than 2N − 1) for all six of our cases, N = 2, ..., 7. For N = 2, this is



$$ \widetilde{\mathrm{GM}}_2 = 3 - 2r^2, \qquad (46) $$
for N = 4,

$$ \widetilde{\mathrm{GM}}_4 = \frac{1}{12}\left(87 - 158\, r^2 + 107\, r^4\right), \qquad (47) $$
and for N = 6,

$$ \widetilde{\mathrm{GM}}_6 = \frac{1}{120}\left(1425 - 1070\, r^2 + 307\, r^4 - 62\, r^6\right). \qquad (48) $$

These three functions, scaled by their values at r = 1 (that is, by N − 1), are plotted in Fig. 5.

FIG. 5. Scaled YL-traces: traces, scaled by N − 1, for N = 2, 4 and 6 based on the Yuen-Lax/maximal monotone metric analysis, plotted against r. The y-intercepts (r = 0) decrease with N.

The modified traces for N = 3 and 7 are (three-line) functions not only of r, as previously, but of θ and φ as well. For N = 5, we have

$$ \widetilde{\mathrm{GM}}_5 = \frac{1}{16}\left(147 - 96\, r^2 + 13\, r^4 + \frac{10\,(r^2 - 1)^3}{r^2\cos 2\theta - 2}\right). \qquad (49) $$



In the fully mixed state limit (r → 0), the values of the traces for N = 2, ..., 7 are 3, 5, 7.25, 9.5, 11.875 and 11.1875, respectively.
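The limiting values can be cross-checked against the closed forms (46)-(49) as written above. The sketch below is our own check: it evaluates each trace at r = 0 and r = 1 and compares the results with the list just given and with the pure-state value N − 1. Note that the r² coefficient in (47) and the cubic exponent in (49) were themselves inferred from these limits when the garbled equations were reconstructed, so for those two the check is one of consistency only; (48) provides an independent confirmation. The scaled intercepts, trace(0)/(N − 1), also show the ordering of the curves in Fig. 5.

```python
import math
from fractions import Fraction

def gm2(r):  # (46)
    return 3 - 2 * r**2

def gm4(r):  # (47)
    return (87 - 158 * r**2 + 107 * r**4) / Fraction(12)

def gm6(r):  # (48)
    return (1425 - 1070 * r**2 + 307 * r**4 - 62 * r**6) / Fraction(120)

def gm5(r, theta):  # (49); also depends on theta
    return (147 - 96 * r**2 + 13 * r**4
            + 10 * (r**2 - 1) ** 3 / (r**2 * math.cos(2 * theta) - 2)) / 16

if __name__ == "__main__":
    for N, gm in ((2, gm2), (4, gm4), (6, gm6)):
        at0, at1 = gm(Fraction(0)), gm(Fraction(1))
        # fully mixed value, pure-state value (expected N - 1), scaled intercept
        print(N, float(at0), at1, float(at0 / (N - 1)))
    print(5, gm5(0.0, 0.7), gm5(1.0, 0.7))
```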

If we alternatively employ the quasi-Bures metric, using g(s) = e s^{s/(1−s)}, then, in the pure state limit for N = 2, 4 and 6, we get traces equal to (4 + e)/e ≈ 2.47152, 3 + 8/e ≈ 5.94304 and 5 + 12/e ≈ 9.41455, respectively. (These results are, then, intermediate between those for the minimal and maximal monotone metrics.) For r = 0, the corresponding outcomes are the same as in the two situations above. In Fig. 6, we plot these three traces scaled by the noted values at r = 1.
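Two observations support the claim that the quasi-Bures results are intermediate. First, g(s) = e s^{s/(1−s)} is the reciprocal of the identric mean of 1 and s, so it lies between the Bures choice 2/(1 + s) and the Yuen-Lax choice (1 + s)/(2s). Second, the three quoted pure-state values can be written as (N − 1) + 2N/e, which sits between N − 1 and 2N − 1. The sketch below (our own check) verifies the ordering of the g functions on a grid and reproduces the quoted decimals. The formula (N − 1) + 2N/e is only a rewriting that fits these three values, not a derived result for other N.

```python
import math

def g_bures(s):
    return 2.0 / (1.0 + s)

def g_yuen_lax(s):
    return (1.0 + s) / (2.0 * s)

def g_quasi_bures(s):
    # e * s**(s/(1-s)); the removable singularity at s = 1 has limiting value 1
    return 1.0 if s == 1.0 else math.e * s ** (s / (1.0 - s))

grid = [k / 20 for k in range(1, 201) if k != 20]   # s in (0, 10], skipping s = 1
ordered = all(g_bures(s) <= g_quasi_bures(s) <= g_yuen_lax(s) for s in grid)

quoted = {2: 2.47152, 4: 5.94304, 6: 9.41455}       # pure-state traces from the text

if __name__ == "__main__":
    print("Bures <= quasi-Bures <= Yuen-Lax on grid:", ordered)
    for N, value in quoted.items():
        print(N, (N - 1) + 2 * N / math.e, value)
```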

FIG. 6. Scaled qB-traces: traces, scaled by their values at r = 1, for N = 2, 4 and 6 based on the quasi-Bures monotone metric analysis, plotted against r. The y-intercepts (r = 0) increase with N.

The curves for N = 2 and 4 intersect at r ≈ 0.395121.
V. CONCLUDING REMARKS



We have explicitly constructed the 3 × 3 Fisher information matrices for the optimal measurements of Vidal et al for N = 2, ..., 7, found that they are tightly bounded by (N − 1)H_q near the pure state boundary, and conjectured
that they converge from above to ^ times the identity matrix at the fully mixed state (r = 0). As our main finding, 
we have uncovered (sec. III C) an interesting (less strict) analogue, for non-separable measurements, of a "new quantum Cramér-Rao inequality" of Gill and Massar [2, eq. (27)]. Extending it to the cases N > 7 appears to be a challenging problem. Also, the development of optimal measurement schemes for multiple copies of m-level systems, m > 2, and the subsequent evaluation of their Fisher information characteristics, merit investigation. In this regard, we have presented in sec. III D 3 additional evidence (for an optimal measurement we devised for the case m = 3, N = 2) that has led us to conjecture that, for optimal non-separable measurements of N copies of m-level quantum systems, the "Gill-Massar trace" equals (2N − 1)(m − 1) in the pure state limit for all m and N.
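For orientation, the conjectured pure-state values are easy to tabulate. This small sketch (our own, with a hypothetical function name) lists (2N − 1)(m − 1) for m = 2 next to the ceiling N that, as recalled in the abstract, applies to separable measurements, and gives the m = 3, N = 2 value.

```python
def conjectured_pure_state_trace(N, m):
    """Conjectured pure-state Gill-Massar trace for optimal non-separable measurements."""
    return (2 * N - 1) * (m - 1)

if __name__ == "__main__":
    for N in range(2, 8):
        # m = 2: conjectured (and, for N <= 7, computed) value vs. the separable ceiling N
        print(N, conjectured_pure_state_trace(N, 2), N)
    print("m = 3, N = 2:", conjectured_pure_state_trace(2, 3))
```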

Additionally, it would be of interest to study the Fisher information matrices associated with optimal measurements based on continuous oproms [47, p. 386], [48]. The relation between optimal measurements (sec. III) and universal quantum coding (sec. IV B), both involving averaging with respect to isotropic prior probability distributions by projecting onto total spin eigenstates, appears to be worthy of further consideration. (Fischer and Freyberger recently compared the use of single adaptive measurements, which possess certain practical advantages, with the use of non-separable ones.)

We have also investigated here several related topics, all pertaining to the information-theoretic properties of two-level quantum systems. We have posed the problem of constructing an operator-valued probability measure (oprom), for the smallest possible number of copies N > 4, which yields the quadrinomial probability distribution, the Fisher information matrix for which is simply four times the quantum (Helstrom) information matrix. Also, we discuss in sec. III A 6 what appears to be an intriguing connection between our results and the work of Frieden [49] concerning differences between classical and quantum information.